Basics of Probability Theory $_{lec1}$

The Calculus of Probabilities

Probability operators

Probability properties

If $P$ is a probability function and $A$ and $B$ are any two sets in the sigma-algebra $\mathcal{B}$ , then
- $P(\emptyset ) = 0$ , where $\emptyset$ is the empty set
- $P(A) \le 1$
- $P(B \cap A^c ) = P(B) - P(A \cap B)$
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- If $A \subset B$ , then $P(A) \le P(B)$
Events $A_1$ and $A_2$ are pairwise independent (statistically independent) if and only if
- $P(A_1 \cap A_2) = P(A_1)P(A_2)$
Events $A_1, \dots, A_n$ are mutually independent if and only if the product rule holds for every subcollection of them; in particular,
- $P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n)$
Note the difference between independent and mutually exclusive events:
- mutually exclusive: $P(A \cap B) = 0$
- independent: $P(A \cap B) = P(A)P(B)$
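
The distinction can be checked by brute-force enumeration. The sketch below is not part of the original notes: it assumes two fair six-sided dice, and the four events are hypothetical examples chosen for illustration. It shows one pair of events that is independent and one pair that is mutually exclusive (and therefore not independent, since both events have positive probability).

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair six-sided dice, 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def both(a, b):
    """Intersection of two events."""
    return lambda w: a(w) and b(w)

# Hypothetical events used only for illustration.
first_is_even = lambda w: w[0] % 2 == 0
second_is_six = lambda w: w[1] == 6
sum_is_two = lambda w: w[0] + w[1] == 2
sum_is_twelve = lambda w: w[0] + w[1] == 12

# Independent pair: P(A and B) equals P(A) * P(B).
independent_pair = (
    prob(both(first_is_even, second_is_six))
    == prob(first_is_even) * prob(second_is_six)
)

# Mutually exclusive pair: P(A and B) is 0, which differs from P(A) * P(B) > 0.
exclusive_pair = prob(both(sum_is_two, sum_is_twelve)) == 0
exclusive_pair_is_independent = (
    prob(both(sum_is_two, sum_is_twelve))
    == prob(sum_is_two) * prob(sum_is_twelve)
)

print(independent_pair, exclusive_pair, exclusive_pair_is_independent)
```
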
Let $A$ and $B$ be events with $P(B) > 0$ . The conditional probability of $A$ given $B$ , denoted by $P(A|B)$ , is defined as
- $P(A|B) = \frac{P(A\cap B)}{P(B)}$
Total probability theorem: if $B_1, \dots, B_n$ partition the sample space, then (written out for $n = 3$ , then in general)
- $P(A) = P(B_1) P(A|B_1) + P(B_2) P(A|B_2) + P(B_3) P(A|B_3)$
- $P(A) = \sum\limits_{i=1}^{n}P(B_i) P(A|B_i)$
Bayes' Theorem
- $P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum\limits_{k=1}^{n}P(A|B_k)P(B_k)}$
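
A small numeric sketch of the two formulas above, using exact fractions. The scenario is an assumption made up for illustration: three sources $B_1, B_2, B_3$ with prior shares 0.5, 0.3, 0.2 and conditional probabilities $P(A|B_i)$ of 0.01, 0.02, 0.03. The function names are hypothetical.

```python
from fractions import Fraction

# Assumed priors P(B_i); they must sum to 1 because the B_i form a partition.
prior = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
# Assumed likelihoods P(A | B_i).
likelihood = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

def total_probability(prior, likelihood):
    """P(A) = sum_i P(B_i) P(A | B_i)."""
    return sum(p * l for p, l in zip(prior, likelihood))

def bayes_posterior(prior, likelihood):
    """P(B_i | A) for every i, using the total probability as the denominator."""
    evidence = total_probability(prior, likelihood)
    return [p * l / evidence for p, l in zip(prior, likelihood)]

p_a = total_probability(prior, likelihood)
posterior = bayes_posterior(prior, likelihood)
print(p_a, posterior)
```

The posterior probabilities sum to 1 by construction, which is a quick sanity check when applying Bayes' theorem by hand.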

Counting

inclusion-exclusion
- $|A\cup B| = |A| + |B| - |A\cap B|$
Permutations and combinations
- $P(n,m) = \frac{n!}{(n-m)!}$
- $C(n,m) = \frac{n!}{m!(n-m)!}$
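
As a quick check of the two counting formulas, the sketch below compares them with Python's built-in `math.perm` and `math.comb` (available from Python 3.8) and with explicit enumeration. The values $n = 5$ and $m = 3$ are arbitrary choices for illustration.

```python
import math
from itertools import combinations, permutations

n, m = 5, 3  # arbitrary illustrative values

# Formulas from the notes.
p_formula = math.factorial(n) // math.factorial(n - m)
c_formula = math.factorial(n) // (math.factorial(m) * math.factorial(n - m))

# Counting by explicit enumeration of ordered and unordered selections.
p_count = len(list(permutations(range(n), m)))
c_count = len(list(combinations(range(n), m)))

print(p_formula, math.perm(n, m), p_count)
print(c_formula, math.comb(n, m), c_count)
```
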

Random Variable $_{lec2}$

A random variable (r.v.) $X$ is a function from the sample space $\Omega$ of an experiment to the set of real numbers $\mathbb{R}$ :
- $\forall \omega\in \Omega, X(\omega) = x \in \mathbb{R}$
Note that a random variable is a function: it is neither a variable nor random.

Cumulative distribution function

The cdf of a r.v. $X$ , denoted by $F_X(x)$ , is defined by:
- $F_X(x) = P_X(X\le x)$
$\lim _{x \rightarrow -\infty} F_X(x) = 0$
$\lim _{x \rightarrow \infty} F_X(x) = 1$
$F(x)$ is a nondecreasing function of $x$
$F(x)$ is right-continuous
Two r.v.s that are identically distributed are not necessarily equal.

Probability mass function

The pmf of a discrete r.v. $X$ is given by $f_X(x) = P(X = x)$

Probability density function

The probability density function, or pdf, $f_X(x)$ of a continuous r.v. $X$ is the function that satisfies:
- $F_X(x) = \int_{- \infty}^x f_X(t) dt$
The statement that $X$ has a distribution given by $F_X (x)$ is abbreviated symbolically by $X \sim F_X (x)$ or $X \sim f_X (x)$ .

Joint distribution $_{lec3}$

$P((X,Y)\in A) = \sum_{(x,y)\in A} f(x,y)$
$P((X,Y)\in A) = \int \int_A f(x,y)dxdy$
$f_X(x) = \int_{-\infty}^{+\infty}f_{X,Y}(x,y)dy$
$\frac{\partial ^2F(x,y)}{\partial x\partial y} = f(x,y)$
$f(x|y) = \frac{f(x,y)}{f_Y(y)}$
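If $f(x,y) = f_X(x)f_Y(y)$ for all $x, y$ , then $X$ and $Y$ are independent.
- If the joint density is separable (it factors into a function of $x$ times a function of $y$ ), independence can be concluded directly, without computing the marginal distributions.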

Bivariate function

Let $(X,Y)$ be a bivariate r.v. Consider a new bivariate r.v. $(U,V)$ defined by $U = g_1(X,Y)$ and $V = g_2(X,Y)$ .

Transformation of discrete

$B = \{(u,v) | u = g_1(x,y), v=g_2(x,y) ,(x,y) \in A \}$
$A_{uv} = \{ (x,y)\in A | u = g_1(x,y), v=g_2(x,y) \}$
$f_{U,V}(u,v) = P(U = u,V= v) = P((X,Y) \in A_{uv}) = \sum_{(x,y)\in A_{uv} } f_{X,Y}(x,y)$
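
The summation over $A_{uv}$ can be sketched in a few lines of Python. The setup is an assumption for illustration only: $X$ and $Y$ are two independent fair dice, with $g_1(x,y) = x + y$ and $g_2(x,y) = |x - y|$. Grouping the joint pmf by the value of $(g_1, g_2)$ is the same as summing $f_{X,Y}$ over each set $A_{uv}$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed joint pmf of (X, Y): two independent fair dice.
f_xy = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def g1(x, y):
    return x + y       # U = X + Y (illustrative choice)

def g2(x, y):
    return abs(x - y)  # V = |X - Y| (illustrative choice)

# Accumulate f_{X,Y}(x, y) over all (x, y) mapping to the same (u, v).
f_uv = defaultdict(Fraction)
for (x, y), p in f_xy.items():
    f_uv[(g1(x, y), g2(x, y))] += p

print(f_uv[(7, 1)])  # mass of the points (3, 4) and (4, 3)
```
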

Transformation of continuous

$J=\left|\begin{array}{ll} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{array}\right|$

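$f_{U,V}(u,v) = f_{X,Y}(h_1(u,v),h_2(u,v))|J|$ , where $x = h_1(u,v)$ and $y = h_2(u,v)$ is the inverse transformation.
This is the inverse-function method. If a problem cannot be solved with the inverse function, work with the cumulative distribution function instead and compute from there.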

Expectation & covariance $_{lec4}$

Expectation value

denoted as $E(g(X))$ :

$\begin{aligned} &E(g(X))=\int_{-\infty}^{+\infty} g(x) f_{X}(x)\,dx \text { if } X \text { is continuous }\\ &=\sum_{x} g(x) P(X=x) \text { if } X \text { is discrete } \end{aligned}$

Note: the expectation does not always exist.
- Cauchy r.v., with pdf:
  - $f_X(x) = \frac{1}{\pi (1+x^2)}$
- $E|X| = \infty$ , so $E(X)$ does not exist

Linearity of expectations

$E(ag_1(X) + bg_2(X) + c ) = aE(g_1(X)) + bE(g_2(X)) + c$
if $a \le g_1(x) \le b$ for all $x$ , then $a\le E(g_1(X)) \le b$

Uniform exponential relationship

A uniform distribution can be transformed to generate other distributions, such as the exponential and the normal; this is what is actually done on a computer.
Suppose $X \sim U(0,1)$ and let $Y = g(X) = -\log X$ . Then, for $y > 0$ :
- $F_Y(y) = P(Y\le y) = P(-\log X \le y) = P(X\ge e^{-y}) = 1- e^{-y}$
- $f_Y(y) = e^{-y}$
- so $Y \sim \exp(1)$
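
The derivation can be checked empirically. The following is a minimal simulation sketch, not a production sampler: the seed and sample size are arbitrary assumptions, and the helper name `sample_exp1` is hypothetical. It draws uniform numbers, applies $-\log$, and compares the sample mean and the empirical cdf at $y = 1$ with the $\exp(1)$ values $1$ and $1 - e^{-1}$.

```python
import math
import random

def sample_exp1(rng):
    """Draw one exp(1) sample as -log(U) with U uniform.

    rng.random() returns a value in [0, 1); using 1 - rng.random() gives a
    value in (0, 1], which has the same uniform distribution but avoids log(0).
    """
    u = 1.0 - rng.random()
    return -math.log(u)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
samples = [sample_exp1(rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)                  # should be close to 1
ecdf_at_1 = sum(s <= 1.0 for s in samples) / len(samples)  # close to 1 - e^{-1}

print(sample_mean, ecdf_at_1, 1 - math.exp(-1))
```
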

Moment

For each integer $n$ , the $n$-th moment of $X$ is $\mu'_n = E(X^n)$
The $n$-th central moment of $X$ is $\mu_n = E((X - \mu)^n)$ , where $\mu = \mu'_1 = E(X)$

Variance

The variance of a r.v. $X$ is its second central moment:
- $var(X) = E((X-\mu)^2)$
Equivalently, $var(X) = E(X^2) - (E(X))^2$
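
A short check that the definition and the shortcut formula agree, using exact arithmetic. The fair six-sided die is an assumed example, and `expectation` is a hypothetical helper for a discrete pmf.

```python
from fractions import Fraction

# Assumed example: pmf of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
    """E(g(X)) = sum_x g(x) P(X = x) for a discrete r.v."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expectation(lambda x: x, pmf)

# Definition: second central moment.
var_definition = expectation(lambda x: (x - mu) ** 2, pmf)
# Shortcut: E(X^2) - (E(X))^2.
var_shortcut = expectation(lambda x: x ** 2, pmf) - mu ** 2

print(mu, var_definition, var_shortcut)
```
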

Nonlinearity of variance

$var(aX+b) = a^2var(X)$
If $X$ and $Y$ are two independent r.v.s on a sample space $\Omega$ , then:
- $var(X+Y) = var(X) + var(Y)$

Independence

If $X$ and $Y$ are independent r.v.s on a sample space $\Omega$ , then:
- $E(XY) = E(X)E(Y)$
- $var(X+Y)= var(X)+var(Y)$
- $var(X-Y) = var(X) +var(Y)$

Moment Generating Function

The moment generating function (mgf) can be used to calculate moments.
The moment generating function of $X$ , denoted by $M_X(t)$ , is:
- $M_X(t) = E(e^{tX})$
$M_{aX+b}(t) =e^{bt}M_X(at)$
The mgf is used in deriving the Chernoff bound.
If the expectation does not exist, the moment generating function does not exist.
If $X$ is continuous, $M_X(t) = \int _{-\infty}^{+\infty} e^{tx}f_X(x) dx$
If $X$ is discrete, $M_X(t) = \sum_xe^{tx}P(X= x)$

Theorem

If $X$ has moment generating function $M_X(t)$ , then:
- $E(X^n) = M_X^{(n)}(0)$
where we define:
- $M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) | _{t=0}$
For example, this can be used to calculate $E(X)$ for a Gamma r.v.
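
The Gamma case can be sketched numerically. The notes do not state the Gamma mgf; the sketch assumes the standard shape-scale parameterization, for which $M_X(t) = (1 - \beta t)^{-\alpha}$ for $t < 1/\beta$ , with $E(X) = \alpha\beta$ and $var(X) = \alpha\beta^2$ . Derivatives at $t = 0$ are approximated by central finite differences, so the results are approximate, and the step sizes and parameter values are arbitrary choices.

```python
def gamma_mgf(t, alpha, beta):
    """Assumed mgf of Gamma(shape=alpha, scale=beta), valid for t < 1/beta."""
    return (1.0 - beta * t) ** (-alpha)

def first_derivative_at_zero(m, h=1e-5):
    """Central-difference approximation of M'(0), i.e. E(X)."""
    return (m(h) - m(-h)) / (2 * h)

def second_derivative_at_zero(m, h=1e-4):
    """Central-difference approximation of M''(0), i.e. E(X^2)."""
    return (m(h) - 2 * m(0.0) + m(-h)) / (h * h)

alpha, beta = 3.0, 2.0  # arbitrary illustrative parameters
m = lambda t: gamma_mgf(t, alpha, beta)

mean = first_derivative_at_zero(m)            # approximately alpha * beta
second_moment = second_derivative_at_zero(m)  # approximately alpha * (alpha + 1) * beta^2
variance = second_moment - mean ** 2          # approximately alpha * beta^2

print(mean, second_moment, variance)
```
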

Property

$M_{aX+b}(t) = e^{bt}M_X(at)$

Covariance

The covariance and correlation of $X$ and $Y$ are the numbers defined by:
- $Cov(X,Y) = E((X-\mu _X)(Y-\mu_Y))$
- $\rho_{XY} = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}$
$Cov(X,Y) = E(XY) - \mu_X\mu_Y$
if $X,Y$ are independent r.v.s, then $Cov(X,Y) = 0$ and $\rho_{XY} = 0$
$Var(aX+bY) = a^2Var(X) + b^2Var(Y) + 2abCov(X,Y)$
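The correlation coefficient only indicates whether a linear relationship exists; if it equals 0, we cannot conclude that $X$ and $Y$ are unrelated.
- A nonlinear relationship can still be measured, for example with $\rho(X^2,Y)$ .
- Since functions can generally be approximated by polynomials, such relationships can still be assessed with correlation coefficients of suitable powers of $X$ .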
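
A standard counterexample makes this concrete. The sketch below assumes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (an illustrative choice, not taken from the notes): the covariance of $X$ and $Y$ is exactly 0 although $Y$ is a function of $X$ , while $\rho(X^2, Y)$ equals 1.

```python
import math
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1}, Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

def E(g):
    """Expectation of g(X) under the uniform pmf on the support."""
    return sum(g(x) * p for x in support)

def cov(g, h):
    """Cov(g(X), h(X)) = E(g h) - E(g) E(h)."""
    return E(lambda x: g(x) * h(x)) - E(g) * E(h)

def corr(g, h):
    """Correlation coefficient of g(X) and h(X)."""
    return float(cov(g, h)) / math.sqrt(cov(g, g) * cov(h, h))

X = lambda x: x
Y = lambda x: x * x

cov_xy = cov(X, Y)  # zero: no linear relationship

# Dependence check: P(X=0, Y=0) versus P(X=0) * P(Y=0).
p_x0 = E(lambda x: 1 if x == 0 else 0)
p_y0 = E(lambda x: 1 if Y(x) == 0 else 0)
p_x0_and_y0 = E(lambda x: 1 if (x == 0 and Y(x) == 0) else 0)

rho_x2_y = corr(lambda x: x * x, Y)  # correlation between X^2 and Y

print(cov_xy, p_x0_and_y0, p_x0 * p_y0, rho_x2_y)
```
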

Bivariate normal pdf

$f(x,y) = (2\pi \sigma_X\sigma_Y\sqrt{1-\rho^2})^{-1}\cdot \exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)\right)$
marginal distribution
- $X\sim N(\mu_X,\sigma_X^2)$
- $Y\sim N(\mu_Y,\sigma_Y^2)$
$\rho = \rho_{XY}$
$aX+bY \sim N(a\mu_X+b\mu_Y,a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho \sigma_X\sigma_Y)$

conditional expectation $_{lec4}$

Theorem

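$E(X) = E(E(X|Y))$
- Intuition: taking the expectation within each group first and then averaging over the groups gives the same result as taking the expectation directly.
$Var(X) = E(Var(X|Y)) + Var(E(X|Y))$
- Intuition: the expected within-group variance plus the between-group variance.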
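
Both identities can be verified exactly on a small discrete example. The joint pmf below is made up for illustration, and the helper names are hypothetical. The code computes the conditional mean and variance of $X$ within each group $Y = y$ and then recombines them.

```python
from fractions import Fraction as F

# Assumed joint pmf f(x, y), keyed by (x, y); the values sum to 1.
joint = {(0, 0): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 6), (2, 1): F(1, 3)}

def marginal_y(joint):
    """Marginal pmf of Y."""
    f_y = {}
    for (_, y), p in joint.items():
        f_y[y] = f_y.get(y, 0) + p
    return f_y

def cond_moment(joint, y, f_y, k):
    """E(X^k | Y = y)."""
    return sum(x ** k * p for (x, yy), p in joint.items() if yy == y) / f_y[y]

f_y = marginal_y(joint)
cond_mean = {y: cond_moment(joint, y, f_y, 1) for y in f_y}
cond_var = {y: cond_moment(joint, y, f_y, 2) - cond_mean[y] ** 2 for y in f_y}

# Direct computation of E(X) and Var(X).
e_x = sum(x * p for (x, _), p in joint.items())
var_x = sum(x ** 2 * p for (x, _), p in joint.items()) - e_x ** 2

# Group-wise computation.
e_of_cond_mean = sum(cond_mean[y] * f_y[y] for y in f_y)
e_of_cond_var = sum(cond_var[y] * f_y[y] for y in f_y)
var_of_cond_mean = sum(cond_mean[y] ** 2 * f_y[y] for y in f_y) - e_of_cond_mean ** 2

print(e_x, e_of_cond_mean)
print(var_x, e_of_cond_var + var_of_cond_mean)
```
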

Mixture distribution

Binomial-Poisson hierarchy

If $X| Y \sim Binomial(Y,p)$ and $Y\sim Poisson(\lambda)$ , then:
$P(X=x)= \sum_{y=x}^{\infty} P(X=x,Y=y) = \sum_{y=x}^{\infty} P(X=x|Y=y)P(Y=y) = \frac{(\lambda p)^x}{x!} e^{-\lambda p}$
$\therefore X\sim Poisson(\lambda p)$
Using $E(X) = E(E(X|Y))$ , we easily get $E(X) = E(pY) = p\lambda$
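
The marginalization can be checked numerically. In the sketch below the parameter values $\lambda = 3$ and $p = 0.4$ are arbitrary, and the infinite sum over $y$ is truncated at an assumed cutoff `y_max = 60`, beyond which the Poisson tail is negligible for this $\lambda$ . The summed pmf is compared with a Poisson pmf of rate $\lambda p$ .

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def marginal_pmf(x, lam, p, y_max=60):
    """P(X = x) = sum over y >= x of P(X = x | Y = y) P(Y = y), truncated at y_max."""
    return sum(
        binomial_pmf(x, y, p) * poisson_pmf(y, lam) for y in range(x, y_max + 1)
    )

lam, p = 3.0, 0.4  # arbitrary illustrative parameters

# Largest discrepancy between the summed pmf and the Poisson pmf with rate lam * p.
max_gap = max(
    abs(marginal_pmf(x, lam, p) - poisson_pmf(x, lam * p)) for x in range(10)
)
print(max_gap)
```
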

Beta-binomial hierarchy

If $X|P \sim Binomial(n,P)$ and $P\sim Beta(\alpha,\beta)$ , then
$E(X) = E(E(X|P)) = E(nP) = \frac{n\alpha}{\alpha + \beta}$
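
As a sanity check on the mean, the sketch below uses the standard beta-binomial pmf, which is not stated in the notes and is taken here as an assumption: $P(X = x) = \binom{n}{x} \frac{B(x+\alpha,\, n-x+\beta)}{B(\alpha,\beta)}$ , where $B$ is the beta function. The parameter values are arbitrary. It computes the mean directly from the pmf and compares it with $\frac{n\alpha}{\alpha+\beta}$ .

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(x, n, a, b):
    """Assumed marginal pmf of X when X | P ~ Binomial(n, P) and P ~ Beta(a, b)."""
    return math.comb(n, x) * math.exp(log_beta(x + a, n - x + b) - log_beta(a, b))

n, a, b = 10, 2.0, 3.0  # arbitrary illustrative parameters
pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]

total = sum(pmf)                             # should be close to 1
mean = sum(x * q for x, q in enumerate(pmf)) # should be close to n * a / (a + b)

print(total, mean, n * a / (a + b))
```
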
